Conversation


@tlrx tlrx commented Sep 12, 2025

This change fixes a bug that causes a deadlock in the thread pool merge scheduler when a merge fails due to a tragic event.

The deadlock occurs because Lucene aborts running merges when failing with a tragic event and then waits for them to complete. But those "running" merges might in fact be waiting in Elasticsearch's thread pool merge scheduler task queue, they might be waiting in the backlogged merge tasks queue because the per-shard concurrent merge limit has been reached, or they might simply be waiting for enough disk space to run. In those cases, the failing merge thread waits indefinitely.
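
To illustrate the shape of the problem, here is a minimal, self-contained sketch with hypothetical names only (not the actual Lucene or Elasticsearch code): a single-threaded merge pool where the failing thread waits for a merge that is still queued behind it and therefore can never run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not Lucene or Elasticsearch code: the failing merge
// thread waits for a "running" merge that is in fact still queued behind it and
// can never get a thread, so the wait never returns.
public class MergeWaitDeadlock {
    public static void main(String[] args) throws Exception {
        ExecutorService mergePool = Executors.newFixedThreadPool(1); // a single merge thread
        CountDownLatch queuedMergeDone = new CountDownLatch(1);

        mergePool.submit(() -> {
            // A second merge gets enqueued while the only merge thread is busy failing.
            mergePool.submit(queuedMergeDone::countDown);
            try {
                // The failing thread now waits for that merge "to complete". The real
                // scheduler waits without a timeout; one is used here only to show the hang.
                boolean completed = queuedMergeDone.await(2, TimeUnit.SECONDS);
                System.out.println(completed ? "merge completed" : "deadlock: queued merge never ran");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        mergePool.shutdown();
    }
}
```

Run as-is this prints the deadlock message after the timeout; in the real scheduler there is no timeout, so the thread blocks forever.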

The fix proposed in this change uses the merge thread that is failing due to a tragic event to abort all other enqueued and backlogged merge tasks of the same shard before proceeding with closing the IndexWriter. This way Lucene won't have to wait for any running merges, as they will all have been aborted upfront.
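
As a rough illustration of that approach (hypothetical names, not the actual ThreadPoolMergeScheduler code), the failing thread drains and aborts every pending merge task of the affected shard before the writer is closed:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch, not the actual scheduler code: before closing the IndexWriter,
// the failing thread removes every pending merge task of the affected shard from the
// queued and backlogged collections and aborts each one, so the later wait for
// running merges has nothing left to wait on.
public class AbortPendingMergesSketch {
    record MergeTask(String shardId, Runnable work) {}

    private final Queue<MergeTask> queuedMergeTasks = new ConcurrentLinkedQueue<>();
    private final Queue<MergeTask> backloggedMergeTasks = new ConcurrentLinkedQueue<>();

    void onTragicMergeFailure(String shardId) {
        List<MergeTask> toAbort = new ArrayList<>();
        drainTasksOfShard(queuedMergeTasks, shardId, toAbort);
        drainTasksOfShard(backloggedMergeTasks, shardId, toAbort);
        toAbort.forEach(this::abort); // abort upfront, before closing the IndexWriter
        // ... then proceed with closing the IndexWriter (not shown here)
    }

    private static void drainTasksOfShard(Queue<MergeTask> queue, String shardId, List<MergeTask> out) {
        for (Iterator<MergeTask> it = queue.iterator(); it.hasNext(); ) {
            MergeTask task = it.next();
            if (task.shardId().equals(shardId)) {
                it.remove(); // remove so no merge thread can ever pick it up
                out.add(task);
            }
        }
    }

    private void abort(MergeTask task) {
        // Placeholder: in a real scheduler this would mark the merge aborted and
        // notify whoever is waiting on it.
    }
}
```

Because the queued and backlogged tasks are aborted before the writer is closed, Lucene's subsequent wait for running merges returns immediately.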

Relates ES-12664

@tlrx tlrx added >bug :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Sep 12, 2025
@elasticsearchmachine

Hi @tlrx, I've created a changelog YAML for you.


@henningandersen henningandersen left a comment


Overall looks good. I left some suggestions that I'd like your input on. I think avoiding the pending merge list is a simplification.


tlrx commented Sep 19, 2025

@henningandersen Thanks a lot for the feedback, I borrowed your code with minor changes.

I had a couple hundred successful test executions locally, but I'll continue to run more before merging. This is ready for review.


tlrx commented Sep 19, 2025

@henningandersen I updated the PR with your feedback; let me know if you have any more comments.

I'll run more tests locally and on CI just in case, but so far everything looks green.


@henningandersen henningandersen left a comment


LGTM.


lkts commented Sep 19, 2025

This should be backported, right?

@henningandersen

> This should be backported, right?

Yes, to 8.19, 9.0, 9.1, IIUC.

@tlrx tlrx merged commit e6d78b0 into elastic:main Sep 22, 2025
33 of 34 checks passed

tlrx commented Sep 22, 2025

Thanks @henningandersen for the help!

> This should be backported, right?
>
> Yes, to 8.19, 9.0, 9.1, IIUC.

Will backport today.

@tlrx tlrx deleted the 2025/09/09/ES-12664 branch September 22, 2025 08:13
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Sep 22, 2025
…the IndexWriter (elastic#134656)

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Sep 22, 2025
…the IndexWriter (elastic#134656)

tlrx added a commit that referenced this pull request Sep 22, 2025
…the IndexWriter (#134656) (#135173)

tlrx added a commit that referenced this pull request Sep 22, 2025
…loses the IndexWriter (#134656) (#135175)

Backport of #134656 for 9.0.8
tlrx added a commit that referenced this pull request Sep 22, 2025
…closes the IndexWriter (#134656) (#135177)

Backport of #134656 for 8.19.5
szybia added a commit to szybia/elasticsearch that referenced this pull request Sep 22, 2025
* upstream/main: (50 commits)
  Fix deadlock in ThreadPoolMergeScheduler when a failing merge closes the IndexWriter (elastic#134656)
  ...